Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify ArchiveManager archived entry cache #1022

Open
wants to merge 6 commits into
base: next
Choose a base branch
from

Conversation

bertm
Copy link
Contributor

@bertm bertm commented Dec 1, 2024

The ArchiveManager is keeping track of cached entries in all kinds of different ways, with a lot of code paths being effectively unused.

  • It would try to keep track of whether the archive is still the same as when it was read last time, but there is no code path where that data is actually updated.
  • It would try to determine if ArchiveRestartException should be thrown for the above reason, but there is no code path where this can actually happen.
  • It would try to insert error entries into the cache when the archive file is too big to be cached, but the net effect is the same as when the entry would not be cached in the first place (it always returns a null result when queried).
  • It would keep track of cached items in two different ways, just to be able to unnecessarily clear the cache (and potentially leak memory while at it, as everything was referencing everything else).
  • It would use FreenetURI as a cache key, which is mutable and could break the cache when changed.

Move the cache implementation to a dedicated class that has proper unit tests, and remove all the code paths that were effectively no-op (ignoring a few log statements).

I would suggest to review this commit by commit. The first two commits contain the bulk of the work (their commit messages explain what is happening), the remaining four commit are simpler after the fact cleanups.

bertm added 6 commits December 1, 2024 17:18
The `lastSize` and `lastHash` on the ArchiveStoreContext appear to keep
track of the last observed size and hash for an archive data bucket, but
they are only ever written by the ArchiveManager when they already have
a value - which they never do.

Therefore, there are no code paths for ArchiveRestartedException to be
thrown. This has been the case for at least since 2005, and apparently
it works fine this way.

Inline the default values of these fields in ArchiveManager and prune
the now unreachable code.
Remove the double bookkeeping of archive entries in ArchiveManager and
ArchiveStoreContext and replace it with a central cache implementation.
This simplifies the locking (for which there were instructions in the
comments, but these were ignored in the implementation) and reduces the
number of arguments being passed around.

This also removes the "remove all cached items for an archive" logic
that would likely lead to memory leaks as not all references were
consistently released, and fixes some potential race conditions in
claiming the buckets.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant